Extracting pronunciation rules for phonemic variants

Authors

  • Marelie Davel
  • Etienne Barnard
Abstract

Various automated techniques can be used to generalise from phonemic lexicons through the extraction of grapheme-to-phoneme rule sets. These techniques are particularly useful when developing pronunciation models for previously unmodelled languages, a frequent requirement when developing multilingual speech processing systems. However, many of the learning algorithms (such as Dynamically Expanding Context or Default&Refine) have difficulty accommodating alternate pronunciations that occur in the training lexicon. In this paper we propose an approach for incorporating phonemic variants in a typical instance-based learning algorithm, Default&Refine. We investigate the use of a combined ‘pseudo-phoneme’, associated with a set of ‘generation restriction rules’, to model those phonemes that are consistently realised as two or more variants in the training lexicon. We evaluate the effectiveness of this approach using the Oxford Advanced Learner’s Dictionary, a publicly available English pronunciation lexicon. We find that phonemic variation exhibits sufficient regularity to be modelled through extracted rules, and that acceptable variants may be underrepresented in the studied lexicon. The proposed method is applicable to many approaches besides the Default&Refine algorithm, and provides a simple but effective technique for including phonemic variants in grapheme-to-phoneme rule extraction frameworks.
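The abstract describes the pseudo-phoneme idea only at a high level. The minimal sketch below shows one possible reading of it: grapheme-aligned pronunciation variants are collapsed into a single sequence in which every position where the variants disagree becomes a pseudo-phoneme (a set of alternatives), and such a sequence can be expanded back into concrete pronunciations. It assumes the variants are already aligned to equal length. The function names, the ARPAbet-style symbols and the `allowed` predicate are all hypothetical; `allowed` merely stands in for the paper's generation restriction rules, whose actual form is not given here.

```python
from itertools import product
from typing import Callable, FrozenSet, List, Sequence, Union

# A position holds either one phoneme or a pseudo-phoneme: the set of
# phonemes the variants disagree on at that position.
Symbol = Union[str, FrozenSet[str]]


def merge_variants(variants: Sequence[Sequence[str]]) -> List[Symbol]:
    """Collapse grapheme-aligned pronunciation variants into one sequence.

    Assumes every variant has already been aligned to the same length.
    """
    if not variants:
        raise ValueError("need at least one variant")
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("variants must be aligned to the same length")
    merged: List[Symbol] = []
    for column in zip(*variants):
        options = frozenset(column)
        # Keep a plain phoneme where all variants agree; otherwise emit a pseudo-phoneme.
        merged.append(column[0] if len(options) == 1 else options)
    return merged


def expand(
    merged: Sequence[Symbol],
    allowed: Callable[[List[str]], bool] = lambda pron: True,
) -> List[List[str]]:
    """Generate concrete pronunciations from a merged sequence.

    `allowed` is a hypothetical stand-in for generation restriction rules:
    it filters out combinations that should not be produced.
    """
    choices = [sorted(s) if isinstance(s, frozenset) else [s] for s in merged]
    return [list(p) for p in product(*choices) if allowed(list(p))]


if __name__ == "__main__":
    # Two ARPAbet-style pronunciations of "either" (illustrative data).
    either = merge_variants([["iy", "dh", "er"], ["ay", "dh", "er"]])
    print(expand(either))  # [['ay', 'dh', 'er'], ['iy', 'dh', 'er']]

    # Two independent pseudo-phonemes: unrestricted expansion over-generates.
    seen = [["a", "x", "b"], ["c", "x", "d"]]
    merged = merge_variants(seen)
    print(len(expand(merged)))                          # 4
    print(expand(merged, allowed=lambda p: p in seen))  # only the two seen variants
```

The second example shows why some restriction is needed: with two pseudo-phonemes in one word, unrestricted expansion yields every cross-combination, including pronunciations that never occur in the lexicon. Filtering against the observed variants is only the simplest possible restriction, chosen here for illustration.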

Similar resources

Automatic generation of Korean pronunciation variants by multistage applications of phonological rules

Phonetic transcriptions are often manually encoded in a pronunciation lexicon. This process is time consuming and requires linguistic expertise. Moreover, it is very difficult to maintain consistency. To handle these problems, we present a model that produces Korean pronunciation variants based on morphophonological analysis. By analyzing phonological variations frequently found in spoken Korea...

Developing consistent pronunciation models for phonemic variants

Pronunciation lexicons often contain pronunciation variants. This can create two problems: It can be difficult to define these variants in an internally consistent way and it can also be difficult to extract generalised grapheme-to-phoneme rule sets from a lexicon containing variants. In this paper we address both these issues by creating ‘pseudo-phonemes’ associated with sets of ‘generation re...

Generation and Selection of Pronunciation Variants for a Flexible Word Recognizer

This paper presents an approach for the generation and selection of pronunciation transcriptions for a flexible word recognizer. The basic idea is to produce pronunciation variants and corresponding scores with a set of pronunciation variation rules, which are weighted with their frequencies of occurrence measured on the training data. This approach addresses the problem of interfering transcripti...

Statistical Analysis of Korean Pronunciation Variations

In this paper, we present a statistical analysis of Korean pronunciation variations using a Grapheme-to-Phoneme (GTP) system. The GTP system generates pronunciation variants by applying rules modeling obligatory and optional phonemic changes and allophonic changes in spoken Korean. Experimental results using a PBS (Phonetically Balanced Sentence) Speech DB of 60,000 sentences show that the most...

Hybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition

In this research, we propose a hybrid approach for acoustic and pronunciation modeling for Arabic speech recognition. The hybrid approach benefits from both vocalized and non-vocalized Arabic resources, based on the fact that the amount of non-vocalized resources is always larger than that of vocalized resources. Two speech recognition baseline systems were built: phonemic and graphemic. The two baseli...

Publication date: 2006